Incremental Entity Resolution from Linked Documents
In many government applications we often find that information about
entities, such as persons, is available in disparate data sources such as
passports, driving licences, bank accounts, and income tax records. Similar
scenarios are commonplace in large enterprises having multiple customer,
supplier, or partner databases. Each data source maintains different aspects of
an entity, and resolving entities based on these attributes is a well-studied
problem. However, in many cases documents in one source reference those in
others; e.g., a person may provide his driving-licence number while applying
for a passport, or vice-versa. These links define relationships between
documents of the same entity (as opposed to inter-entity relationships, which
are also often used for resolution). In this paper we describe an algorithm to
cluster documents that are highly likely to belong to the same entity by
exploiting inter-document references in addition to attribute similarity. Our
technique uses a combination of iterative graph-traversal, locality-sensitive
hashing, iterative match-merge, and graph-clustering to discover unique
entities based on a document corpus. A unique feature of our technique is that
new sets of documents can be added incrementally while having to re-resolve
only a small subset of a previously resolved entity-document collection. We
present performance and quality results on two data-sets: a real-world database
of companies and a large synthetically generated 'population' database. We also
demonstrate the benefit of using inter-document references for clustering in the
form of enhanced recall of documents for resolution. Comment: 15 pages, 8 figures, patented work
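The resolution pipeline described above can be illustrated with a much-simplified Python sketch. It is not the authors' (patented) algorithm. It assumes each document is reduced to a set of attribute tokens plus a list of referenced document IDs, substitutes union-find connected components for the paper's graph clustering and match-merge, uses MinHash banding as the locality-sensitive hash, and on incremental additions only ever merges clusters rather than re-resolving a subset. The names `IncrementalResolver`, `bands`, `rows`, and `threshold` are hypothetical.

```python
from collections import defaultdict
import hashlib


class UnionFind:
    """Disjoint-set forest over document IDs (path halving, no ranks)."""

    def __init__(self):
        self.parent = {}

    def add(self, x):
        self.parent.setdefault(x, x)

    def find(self, x):
        self.add(x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def minhash(tokens, num_hashes):
    """MinHash signature: minimum of each seeded hash over a non-empty token set."""
    return [
        min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for seed in range(num_hashes)
    ]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalResolver:
    """Hypothetical resolver: reference links + MinHash LSH + Jaccard verification."""

    def __init__(self, bands=8, rows=4, threshold=0.5):
        self.bands, self.rows, self.threshold = bands, rows, threshold
        self.uf = UnionFind()
        self.tokens = {}
        self.buckets = defaultdict(set)  # (band index, band values) -> doc IDs

    def add_documents(self, docs):
        """docs maps doc_id -> (set of attribute tokens, list of referenced doc IDs)."""
        for doc_id, (tokens, refs) in docs.items():
            self.uf.add(doc_id)
            self.tokens[doc_id] = set(tokens)
            # Inter-document references (e.g. a licence number quoted on a
            # passport application) link documents of the same entity directly.
            for ref in refs:
                self.uf.union(doc_id, ref)
            if not self.tokens[doc_id]:
                continue
            # LSH banding: compare only against documents sharing a bucket.
            sig = minhash(self.tokens[doc_id], self.bands * self.rows)
            for b in range(self.bands):
                key = (b, tuple(sig[b * self.rows:(b + 1) * self.rows]))
                for other in self.buckets[key]:
                    if jaccard(self.tokens[doc_id], self.tokens[other]) >= self.threshold:
                        self.uf.union(doc_id, other)
                self.buckets[key].add(doc_id)

    def entities(self):
        """Current clusters, each a sorted list of document IDs."""
        clusters = defaultdict(set)
        for doc_id in list(self.uf.parent):
            clusters[self.uf.find(doc_id)].add(doc_id)
        return sorted(sorted(c) for c in clusters.values())


resolver = IncrementalResolver()
resolver.add_documents({
    "passport:P1": ({"name:ann lee", "dob:1990-05-01"}, ["licence:D42"]),
    "licence:D42": ({"name:ann lee", "licence:D42"}, []),
})
resolver.add_documents({  # a later, incremental batch
    "bank:B7": ({"name:ann lee", "dob:1990-05-01"}, []),
    "bank:B8": ({"name:bo chan", "dob:1970-01-01"}, []),
})
print(resolver.entities())
# [['bank:B7', 'licence:D42', 'passport:P1'], ['bank:B8']]
```

Here the licence joins Ann Lee's cluster through the reference alone (its attributes overlap the passport's too little to match), which mirrors the recall gain the abstract attributes to inter-document references. Each new batch is compared only against documents already sharing an LSH bucket; the cost of this simplification is that merges are never undone.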
Multi-Sensor Event Detection using Shape Histograms
Vehicular sensor data consists of multiple time-series arising from a number
of sensors. Using such multi-sensor data we would like to detect occurrences of
specific events that vehicles encounter, e.g., corresponding to particular
maneuvers that a vehicle makes or conditions that it encounters. Events are
characterized by similar waveform patterns re-appearing within one or more
sensors. Furthermore, such patterns can be of variable duration. In this work, we
propose a method for detecting such events in time-series data using a novel
feature descriptor motivated by similar ideas in image processing. We define
the shape histogram: a constant-dimension descriptor that nevertheless captures
patterns of variable duration. We demonstrate the efficacy of using shape
histograms as features to detect events in an SVM-based, multi-sensor,
supervised learning scenario, i.e., multiple time-series are used to detect an
event. We present results on real-life vehicular sensor data and show that our
technique performs better than available pattern detection implementations on
our data, and that it can also be used to combine features from multiple
sensors, resulting in better accuracy than using any single sensor. Since
previous work on pattern detection in time-series has been in the single-series
context, we also present results using our technique on multiple standard
time-series datasets and show that it is the most versatile in terms of how it
ranks compared to other published results.
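The abstract does not define how a shape histogram is computed, so the following Python sketch uses an assumed, hypothetical definition purely to illustrate the constant-dimension property. Each first difference is quantized to up, flat, or down (within a tolerance `tol`), and the histogram counts the up/flat/down patterns in every window of `width` consecutive points. The output has `3 ** (width - 1)` bins however long the series is, so per-sensor histograms can be concatenated into one fixed-length multi-sensor feature vector. The function names and parameters are hypothetical.

```python
from itertools import product


def slope_symbol(delta, tol):
    """Quantize a first difference to -1 (down), 0 (flat) or +1 (up)."""
    if delta > tol:
        return 1
    if delta < -tol:
        return -1
    return 0


def shape_histogram(series, width=3, tol=0.1):
    """Normalized counts of up/flat/down patterns over sliding windows.

    The output length is 3 ** (width - 1) regardless of len(series)."""
    patterns = list(product((-1, 0, 1), repeat=width - 1))
    index = {p: i for i, p in enumerate(patterns)}
    counts = [0] * len(patterns)
    deltas = [slope_symbol(b - a, tol) for a, b in zip(series, series[1:])]
    # Each window of `width` points yields `width - 1` consecutive slope symbols.
    for start in range(len(deltas) - (width - 1) + 1):
        counts[index[tuple(deltas[start:start + width - 1])]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def multi_sensor_features(sensors, width=3, tol=0.1):
    """Concatenate per-sensor histograms into one fixed-length feature vector."""
    features = []
    for series in sensors:
        features.extend(shape_histogram(series, width, tol))
    return features


# A short and a long rising ramp produce the same 9-bin descriptor.
print(shape_histogram([0, 1, 2, 3]) == shape_histogram(list(range(8))))  # True
```

Fixed-length vectors like these could then be passed to any SVM implementation, for example scikit-learn's `SVC`, in the supervised multi-sensor setting the abstract describes.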
Impact of distractors in item analysis of multiple choice questions
Background: Item analysis is a quality-assurance process that examines the performance of individual test items to assess the validity and reliability of exams. This study was performed to evaluate the quality of test items with respect to their difficulty index (DFI), discrimination index (DI), and functional and non-functional distractors (FD and NFD).
Methods: This study was performed on a summative examination taken by 113 students. The analysis included 120 one-best-answer (OBA) items and 360 distractors.
Results: Of the 360 distractors, 85 were chosen by less than 5% of students, with a distractor efficiency of 23.6%. About 47 (13%) items had no NFDs, while 51 (14%), 30 (8.3%), and 4 (1.1%) items contained 1, 2, and 3 NFDs respectively. The majority of items showed an excellent difficulty index (50.4%, n=42) and fair discrimination (37%, n=33). Questions with an excellent difficulty index and discrimination index showed a statistically significant association with 1 NFD and 2 NFDs (p=0.03).
Conclusions: Post-examination evaluation of item performance is one of the quality-assurance methods for identifying the best-performing items for a quality question bank. Distractor efficiency provides information on the overall quality of the items.
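The indices named in this abstract are commonly computed with the textbook formulas sketched below in Python. The abstract does not state the exact definitions the study used, so treat these as assumptions: H and L are the numbers of correct answers in the high- and low-scoring groups, N is the number of students in both groups combined, and a distractor is non-functional if fewer than 5% of examinees choose it. The function names and the example option counts are hypothetical, not data from the study.

```python
def difficulty_index(high_correct, low_correct, n_both_groups):
    """(H + L) / N * 100: percentage of both groups answering the item correctly."""
    return (high_correct + low_correct) / n_both_groups * 100


def discrimination_index(high_correct, low_correct, n_both_groups):
    """2 * (H - L) / N, between -1 and 1; higher means better separation of groups."""
    return 2 * (high_correct - low_correct) / n_both_groups


def non_functional_distractors(option_counts, key, n_examinees, cutoff=0.05):
    """Distractors (options other than the key) chosen by fewer than 5% of examinees."""
    return [opt for opt, count in option_counts.items()
            if opt != key and count / n_examinees < cutoff]


def distractor_efficiency(option_counts, key, n_examinees):
    """Percentage of an item's distractors that are functional."""
    n_distractors = sum(1 for opt in option_counts if opt != key)
    n_nfd = len(non_functional_distractors(option_counts, key, n_examinees))
    return (n_distractors - n_nfd) / n_distractors * 100


# Hypothetical item answered by 113 examinees; "A" is the correct option (key).
item = {"A": 70, "B": 30, "C": 10, "D": 3}
print(non_functional_distractors(item, "A", 113))       # ['D']
print(round(distractor_efficiency(item, "A", 113), 1))  # 66.7
print(difficulty_index(25, 15, 60), discrimination_index(25, 15, 60))
```

With three distractors per item, an item with 0, 1, 2, or 3 NFDs has a per-item distractor efficiency of 100%, 66.7%, 33.3%, or 0% under this definition.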
Predicting Remaining Useful Life using Time Series Embeddings based on Recurrent Neural Networks
We consider the problem of estimating the remaining useful life (RUL) of a
system or a machine from sensor data. Many approaches for RUL estimation based
on sensor data make assumptions about how machines degrade. Additionally,
sensor data from machines is noisy and often suffers from missing values in
many practical settings. We propose Embed-RUL: a novel approach for RUL
estimation from sensor data that does not rely on any degradation-trend
assumptions, is robust to noise, and handles missing values. Embed-RUL utilizes
a sequence-to-sequence model based on Recurrent Neural Networks (RNNs) to
generate embeddings for multivariate time series subsequences. The embeddings
for normal and degraded machines tend to be different, and are therefore found
to be useful for RUL estimation. We show that the embeddings capture the
overall pattern in the time series while filtering out the noise, so that the
embeddings of two machines with similar operational behavior are close to each
other, even when their sensor readings have significant and varying levels of
noise content. We perform experiments on a publicly available turbofan engine
dataset and a proprietary real-world dataset, and demonstrate that Embed-RUL
outperforms the previously reported state of the art on several metrics. Comment: Presented at 2nd ML for PHM Workshop at SIGKDD 2017, Halifax, Canada
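Once an RNN encoder has mapped subsequences to embeddings, RUL can be estimated by comparing a test machine's embedding with embeddings of training subsequences whose RUL is known. The Python sketch below assumes the embeddings are already computed (the sequence-to-sequence encoder is not shown) and uses a similarity-weighted k-nearest-neighbour average with an assumed `exp(-distance)` similarity. It illustrates the idea of estimating RUL from embedding similarity; it is not necessarily the exact scoring Embed-RUL uses, and `estimate_rul` and its parameters are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_rul(test_embedding, train_embeddings, train_ruls, k=3):
    """Similarity-weighted mean RUL of the k training embeddings nearest the test one.

    train_embeddings[i] embeds a training subsequence whose remaining useful
    life train_ruls[i] is known (e.g. in operating cycles)."""
    nearest = sorted(
        (euclidean(test_embedding, emb), rul)
        for emb, rul in zip(train_embeddings, train_ruls)
    )[:k]
    # exp(-d) maps distance to a similarity in (0, 1]; closer windows weigh more.
    weights = [math.exp(-d) for d, _ in nearest]
    return sum(w * rul for w, (_, rul) in zip(weights, nearest)) / sum(weights)


# Hypothetical 2-D embeddings: healthy windows near (0, 0), degraded near (5, 5).
train = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
ruls = [100, 90, 10, 12]
print(estimate_rul([5.0, 5.0], train, ruls, k=2))
```

Because the estimate depends only on distances between embeddings, it needs no assumption about the shape of the degradation trend, which is the property the abstract emphasizes.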
Pulmonary Rehabilitation in Chronic Obstructive Pulmonary Disease
With an ever-expanding understanding of chronic obstructive pulmonary disease (COPD), it has become clear that COPD is a respiratory disease with systemic manifestations. Systemic effects of COPD lead to cardiovascular co-morbidities, muscle wasting, and osteoporosis, which in turn lead to inactivity and physical deconditioning. This has a direct impact on the health-related quality of life (HRQoL) of patients with this respiratory disease. Pharmacological therapy improves shortness of breath but has a limited effect on physical deconditioning. Recent research has shown an additive effect of pulmonary rehabilitation in reducing inactivity and improving overall HRQoL in COPD patients. Pulmonary rehabilitation (PR) is a comprehensive multimodality program that includes strength and endurance training, nutritional education, and psychosocial support. It provides a holistic approach to the management of COPD, resulting in symptom improvement and decreased utilization of health care resources. There are several barriers to the widespread adoption of pulmonary rehabilitation as a standard treatment, including availability, insurance coverage, and patient compliance. With the inclusion of pulmonary rehabilitation in respiratory society guidelines, there has been renewed interest among both pulmonary specialists and community physicians. This chapter aims to provide comprehensive, evidence-based knowledge regarding pulmonary rehabilitation and its beneficial effects on COPD patients.
- …